Abstract — In this paper, we propose an algorithm that targets
contamination and eavesdropping adversaries. We consider the
case in which the number of independent packets available to the
eavesdropper is less than the multicast capacity of the network.
With our algorithm, every node can easily verify the integrity
of the received packets, and an eavesdropper is unable
to get any "meaningful information" about the source. We call
this condition "practical security". We show that, by giving
up a small amount of overall capacity, our algorithm achieves
the practically secure condition with probability one,
which is much higher than that of Bhattad and Narayanan [1].
Furthermore, the communication overhead of our algorithm is
negligible compared with previous works, since the transmission
of both the hash values and the code coefficients is avoided.

I. Introduction 

The concept of network coding was first introduced by
Ahlswede et al. They showed that multicast rates can
be increased by allowing network coding instead of routing
alone. Shortly afterwards, Li, Yeung and Cai showed that
it is sufficient for the encoding functions at the interior nodes
to be linear. Ho et al. proposed a random coding
scheme in which the message on each outgoing edge of a node
is chosen to be a random linear combination of the messages
on its incoming edges.

In reality, network transmission may suffer from two kinds
of adversaries: contamination and eavesdropping. Network
coding has been studied as a way to counter both kinds of
adversaries. Ho et al. [6] considered the problem of network
coding in the presence of a Byzantine attacker. Gkantsidis et
al. also considered a related problem. Jaggi et al.
[8] designed a resilient network coding algorithm which is
information-theoretically secure and rate-optimal for different
adversarial strengths. A homomorphic hashing function was first
proposed in [9], which allows nodes to check blocks on the
fly in a system where content is encoded at the source using
rateless codes. However, the total size of the hash values in
that scheme is proportional to the number of blocks, which
can be very large, and the cryptographic hash function is
computationally expensive. Li et al. [10] employed a batch
content distribution verification scheme, which reduced the
computational cost of each node to cache and scan all the
received packets when computing a new packet. The
cryptographic hash function of their scheme is computationally
inexpensive compared with that in [9]. Unfortunately, their
scheme deviates from the classical network coding scheme,
which consumes bandwidth and can induce delay at
the sinks. On the other hand, although batching can decrease
the computation time, batch block verification carries the risk
of letting some malicious packets propagate, since packets are
exchanged without being checked. Thus, standard batching
techniques do not work well with network coding. Zhao et
al. [11] presented a signature scheme with low computation,
but their scheme requires a long start-up latency. Finally, all
the works presented above have to distribute the coefficients,
which consumes bandwidth.

Cai and Yeung [12] considered the problem of using
network coding to achieve perfect information security against
an eavesdropper who can eavesdrop on a limited number
of network links, and presented the construction of a secure
linear network code for this purpose. A similar problem was
considered in [13], featuring a random coding approach in
which only the input vector is modified.

Bhattad and Narayanan [1] first defined a model for security
that is more suitable for practical applications. In this paper,
we also consider this type of model, which is not
information-theoretically secure but is secure enough for the application.
An interesting observation made in [14] was that, for a
computationally limited eavesdropper, the use of a one-way function
makes it possible to transmit at a high rate without the eavesdropper
getting any meaningful information about the source. A more
general threat posed by intermediate nodes was considered in
[15].

In this paper, we consider these two kinds of adversaries
at the same time; that is, the adversary can contaminate the
transmission on a subset of channels and, at the same time,
eavesdrop on another subset of channels with cardinality less
than or equal to the multicast capacity of the network. Ngai
and Yang [16] studied a similar problem and constructed
secure error-correcting network codes.

The main contribution of this paper is an
algorithm that can not only verify the integrity of the
received packets easily but also achieve the practically secure
condition with probability one. In our scheme, we use the
public parameters as the "intended" hash values. The original
packets are padded so that they hash to the public
parameters. In this way the transmission of the hash values
is avoided. The code coefficients in our scheme are generated
by a pseudo-random number generator in each node, so the
distribution of the coefficients is also avoided. We show
that the communication overhead and the start-up latency are
negligible, since the transmission of both the hash values and the
coefficients is avoided.

This paper is organized as follows. In the next section we
introduce the notation used in this paper. The secure network
coding scheme is proposed in Section III. In Section IV, we
analyze the security of our algorithm. The overhead and start-up
latency of our scheme are discussed in Section V. Finally, the
paper is concluded in Section VI.

II. Network model and notions 

In this paper, we assume that all the messages and
coefficients are generated in $F_p$, where $p$ is a sufficiently large
prime number. We use small letters $x$, $y$, etc. to denote
vectors whose dimensions will be clear from the context.
Matrices are denoted by capital letters such as $X$, $\tilde{X}$, etc. The
transpose operator of vectors and matrices is denoted by
"T"; thus $x^T$ stands for a column vector.

A. Network Model 

We represent a network by a directed graph G = (V;E), 
where V is the set of vertices (nodes) and E is the set of edges 
(channels). We assume an order on V which is consistent with 
the associated partial order on G. A network code is said to 
be linear if the message on any outgoing edge of any node is 
a linear combination of the messages on the incoming edges 
of the node. 

In this paper, we assume that the source node sends
information $X$ of the following form:

Therefore, for a linear code the message on edge $e_i \in E$
can be written as $\Gamma_{e_i} X$, where $\Gamma_{e_i}$ is a length-$m$ vector over
$F_p$ (we call it the global encoding kernel in this paper) on edge
$e_i \in E$.

B. Threat Model 

There is a source, Alice, and a destination, Bob, who
communicate over a wired or wireless network. There is also
an eavesdropper, Calvin, hidden somewhere in the network. He
aims to eavesdrop on the transfer of information from Alice
to Bob and to inject his own. A malicious node can generate
corrupted packets and then distribute them to other nodes,
which in turn use them to (unintentionally) create new encoded
packets that are also corrupted. A wiretap network is specified
by a collection $\mathcal{A}$ of sets of edges,
$\mathcal{A} = \{A_1, A_2, \ldots, A_{|\mathcal{A}|}\}$,
$A_i \subseteq E$. Calvin selects a particular set $A_i \in \mathcal{A}$ and listens
to all messages transmitted on edges in $A_i$ to get some
information. We assume that the set does not change with time.
Given a linear code and a wiretap network, we also
use $A_i$ to represent a matrix whose rows contain all linearly
independent global encoding kernels corresponding to the edges
$e_j \in A_i$. In this case, the messages available to Calvin are
$A_i X$. The number of rows in $A_i$ is denoted by $k_i$. We
define $k$ as $\max_i k_i$.

Fig. 1. Networks

C. Notions 

1) Network capacity: The network capacity is the time-average of the
maximum number of packets that can be delivered from Alice
to Bob, assuming no adversarial interference, i.e., the
max-flow. To simplify notation, in this paper we assume the
max-flow from Alice to Bob is $m$.

2) Practical security: Consider a set of messages $M$. Let
$U$ be a subset of the set containing the multicast information $X$.
We say that $M$ has no information about $U$ if $I(U; M) = 0$.
We say that $M$ has no meaningful information about $U$ if
$I(x_i; M) = 0$, $\forall x_i \in U$. In this paper we concentrate on
two special cases and generalize the results towards the end.
We say that Calvin has no information about the source if
$I(X; M) = 0$, where $M$ is the set of messages that Calvin
chooses to observe. The security condition considered by Cai
and Yeung [12] falls in this category. We will use Shannon
security to refer to this security requirement. The second case
we consider is when Calvin gets no meaningful information
about the source, i.e., $I(x_i; M) = 0$, $\forall x_i$, for messages $M$
observed by Calvin. We call this type of security practical
security.

Note that if Alice transmits a linear transformation
of $X$, $PX$, instead of $X$, then the message transmitted on
edge $e_j$ is $\Gamma_{e_j} P X$ ($P$ is an $m \times m$ matrix which
is unknown to Calvin). In this case, although Calvin has
some information about the source, he is unable to get any
meaningful information.

As shown in Fig. 1, let us assume that Calvin can listen to
any one edge of this network. The multicast capacity of this
network is 2. $x_1$ and $x_2$ are the messages of Alice. In Fig. 1(a),
$w$ is a uniform random sequence independent of the messages.
This is an example of the coding scheme constructed by Cai
and Yeung [12]. Obviously, the maximum multicast capacity
supported is 1 when this system has to be Shannon secure.
When the security condition is relaxed to practical security, as
shown in Fig. 1(b), the max-flow can be achieved.
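
The distinction between the two security notions can be checked numerically on a toy case. The sketch below is an illustration only, not the network of Fig. 1: it assumes a hypothetical tapped edge that carries $x_1 + x_2 \pmod p$ over a toy field with $p = 5$, with $x_1, x_2$ uniform and independent. It computes the mutual information by exhaustive enumeration.

```python
from collections import Counter
from itertools import product
from math import log2

def mutual_information(pairs):
    """I(A;B) in bits, for a list of equally likely (a, b) outcomes."""
    total = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    return sum(
        (c / total) * log2(c * total / (left[a] * right[b]))
        for (a, b), c in joint.items()
    )

p = 5  # toy field size (assumption for illustration)
sources = list(product(range(p), repeat=2))          # uniform (x1, x2)
observed = [(x1 + x2) % p for x1, x2 in sources]     # hypothetical tapped edge

i_x1 = mutual_information([(s[0], m) for s, m in zip(sources, observed)])
i_x2 = mutual_information([(s[1], m) for s, m in zip(sources, observed)])
i_joint = mutual_information(list(zip(sources, observed)))

print(f"I(x1;M) = {i_x1:.3f}, I(x2;M) = {i_x2:.3f}, I(X;M) = {i_joint:.3f}")
```

In this toy case the observation leaks information about the pair $(x_1, x_2)$, so it is not Shannon secure, yet it reveals nothing about either symbol on its own, which is the practical security condition.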

III. Secure network coding 

A. The Homomorphic Hash Function 

We first choose the hash parameters $q$ and $g$. Let $o(x)$ denote
the order of $x$ in the field $F_q$. Here we choose $g$ with $o(g) = p$ in
$F_q$ (so $p$ divides $q - 1$, and $g$ generates a subgroup of order $p$
of $F_q^*$). Furthermore, we randomly select
$n + 2$ numbers $u_0, u_1, \ldots, u_n, u_{n+1}$ from $F_p$. Next, we compute
$g_i = g^{u_i} \pmod q$ for all $0 \le i \le n + 1$. The public
parameters of the hash function are $p, q, g_0, g_1, \ldots, g_{n+1}$, whereas
$u_0, u_1, \ldots, u_{n+1}$ and $g$ should be kept secret.

Formally, we define $DL[g, p, q]$ to be the computational
problem: Given $y$, $g$ and $q$, where $o(g) = p$ in $F_q$, find $x$
such that $y = g^x \pmod q$. Hence, we have

Lemma 1: Given $g_0, g_1, \ldots, g_{n+1}$ and the public
parameters $p, q$, it is computationally infeasible for a node to find
$u_0, u_1, \ldots, u_{n+1}$ such that $g_i = g^{u_i} \pmod q$ if $DL[g, p, q]$ is
hard.

Assume that each message is of the form
$x = (x_0, x_1, \ldots, x_n, r)$, where $x_i, r \in F_p$ for $0 \le i \le n$,
and the hash of $x$ is computed as

Based on this construction, we have

$$\mathcal{H}(x) = g^{\left(\sum_{i=0}^{n} u_i x_i + u_{n+1} r\right) \bmod p} \pmod q \qquad (3)$$



For any two messages $x = (x_0, x_1, \ldots, x_n, r_1)$ and
$y = (y_0, y_1, \ldots, y_n, r_2)$, we define the addition of $x$ and $y$ as

$$x + y = (z_0, z_1, \ldots, z_n, r) \qquad (4)$$

where $r = (r_1 + r_2) \pmod p$ and $z_i = (x_i + y_i) \pmod p$
for $0 \le i \le n$.

Hence, this hash function has the following homomorphic
property

$$\mathcal{H}(x)\,\mathcal{H}(y) = \mathcal{H}(x + y) \qquad (5)$$
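
The homomorphic property can be checked on toy parameters. In the sketch below, the explicit product form $\prod_{i=0}^{n} g_i^{x_i} \cdot g_{n+1}^{r} \pmod q$ used in `hash_packet` is an assumption inferred from Eq. (3) and the definition $g_i = g^{u_i}$; the constants and the exponent list `U` are hypothetical toy values.

```python
P_MOD, Q_MOD, GEN = 11, 23, 4      # toy parameters: GEN has order 11 modulo 23
U = [3, 7, 2, 9, 5]                # hypothetical secrets u_0..u_{n+1}, n = 3
G_PUB = [pow(GEN, u, Q_MOD) for u in U]

def hash_packet(x):
    """Hash x = (x_0, ..., x_n, r) using only the public values g_i."""
    assert len(x) == len(G_PUB)
    h = 1
    for g_i, x_i in zip(G_PUB, x):
        h = h * pow(g_i, x_i, Q_MOD) % Q_MOD
    return h

def hash_with_secret(x):
    """Same value computed as in Eq. (3), using the secret exponents."""
    exponent = sum(u * x_i for u, x_i in zip(U, x)) % P_MOD
    return pow(GEN, exponent, Q_MOD)

def add(x, y):
    """Component-wise addition modulo p, as in Eq. (4)."""
    return tuple((a + b) % P_MOD for a, b in zip(x, y))

x = (1, 4, 10, 6, 2)
y = (9, 0, 3, 8, 7)
assert hash_packet(x) == hash_with_secret(x)
assert hash_packet(x) * hash_packet(y) % Q_MOD == hash_packet(add(x, y))
```

The second assertion is Eq. (5); it holds because the order of $g$ is $p$, so exponents can be reduced modulo $p$.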



The security of $\mathcal{H}$ is defined in terms of the difficulty of
finding collisions. It can be shown that the hash function is
collision-free if $DL[g, p, q]$ is hard. In particular, we
have:

Lemma 2: The hash function $\mathcal{H}$ is collision-free (namely,
it is computationally infeasible to find two different messages
$x_1$ and $x_2$ such that $\mathcal{H}(x_1) = \mathcal{H}(x_2)$) if $DL[g, p, q]$ is hard.

The proof follows an argument in [11] (proof of Theorem 3.4).



B. Alice's Encoder and Bob's Decoder 

Alice's encoder: Alice encodes $X$ in the following steps.
She first chooses $m$ parity symbols $r_d$, for $d \in \{1, \ldots, m\}$,
uniformly at random from the field $F_p$ and then generates a
Vandermonde matrix $P$ as follows

In the second step, Alice pre-multiplies the source message
$X$ with $P$

In the third step, Alice adds $r_1, r_2, \ldots, r_m$ to the resulting
matrix and gets the encoded matrix as follows

Alice uses $g_i$, $1 \le i \le m$ (this is possible since any
practical network coding system has $m \ll n$), as the
"intended" hash values of $x_1, x_2, \ldots, x_m$. A list of padding values
$x_{10}, x_{20}, \ldots, x_{m0}$ can then be computed. We add the padding to
every packet and get $\tilde{X}$, as shown in Fig. 1.

When a message packet $x_i$ is padded with $x_{i0}$ to form the
new packet $\tilde{x}_i$, then $\mathcal{H}(\tilde{x}_i) = g_i$, $1 \le i \le m$.

Bob's decoder: Bob first decodes $\tilde{X}$ and gets
$r_1, r_2, \ldots, r_m$. The Vandermonde matrix $P$ can be computed
from $r_1, r_2, \ldots, r_m$. Bob then pre-multiplies the associated
matrix $PX$ with $P^{-1}$ and gets the original packets $X$.
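
The encode/decode round trip can be sketched as follows. The exact form of $P$ is not reproduced in this text, so the sketch assumes the standard Vandermonde layout in which row $i$ is $(1, r_i, r_i^2, \ldots, r_i^{m-1})$, and it assumes that $r_i$ is appended as the last symbol of row $i$. It also samples distinct $r_i$ so that $P$ is invertible. All function names are hypothetical.

```python
import random

def vandermonde(r, p):
    """Row i is (1, r_i, r_i^2, ..., r_i^(m-1)) mod p (assumed layout)."""
    return [[pow(r_i, j, p) for j in range(len(r))] for r_i in r]

def mat_mul(a, b, p):
    return [[sum(x * y for x, y in zip(row, col)) % p for col in zip(*b)]
            for row in a]

def mat_inv(a, p):
    """Gauss-Jordan inverse over F_p (p prime, a invertible)."""
    m = len(a)
    aug = [list(row) + [int(i == j) for j in range(m)]
           for i, row in enumerate(a)]
    for col in range(m):
        pivot = next(r for r in range(col, m) if aug[r][col] % p != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], -1, p)          # modular inverse (Python 3.8+)
        aug[col] = [v * inv % p for v in aug[col]]
        for r in range(m):
            if r != col and aug[r][col]:
                f = aug[r][col]
                aug[r] = [(v - f * w) % p for v, w in zip(aug[r], aug[col])]
    return [row[m:] for row in aug]

def encode(X, r, p):
    """Pre-multiply X by P, then append r_i to row i."""
    PX = mat_mul(vandermonde(r, p), X, p)
    return [row + [r_i] for row, r_i in zip(PX, r)]

def decode(encoded, p):
    """Read r_i from each row, rebuild P, and apply its inverse."""
    r = [row[-1] for row in encoded]
    PX = [row[:-1] for row in encoded]
    return mat_mul(mat_inv(vandermonde(r, p), p), PX, p)

p, m, n = 257, 3, 5
rng = random.Random(1)
X = [[rng.randrange(p) for _ in range(n)] for _ in range(m)]
r = rng.sample(range(1, p), m)        # distinct parity symbols
assert decode(encode(X, r, p), p) == X
```

The hash padding of the next step is left out here so that the sketch isolates the role of $P$.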

C. The Basic Verification Scheme 

As shown in Fig. 1, the message can only
be padded using the secret key, which is known only to Alice.
Next, Alice chooses a seed $c$ and feeds it to a pseudo-random
generator $G$. Instead of choosing the coefficients, the source
uses the random numbers $c_1, c_2, \ldots, c_m$ generated by $G$ as the
"intended" coefficients. Since the coefficients can be computed
from the public function $G$, there is no need to
distribute the coefficients; it suffices that all the nodes know
$c$.

Our proposed scheme consists of two algorithms, namely
the encoding algorithm and the verification algorithm.

Encoding Algorithm: The encoder performs the
following steps

1) Choose a random seed $c$.

2) Generate pseudo-random numbers $c_1, c_2, \ldots, c_m$ from $G$
with $c$.

3) For each $1 \le i \le m$, choose $g_i = g^{u_i} \pmod q$ as the
"intended" hash values.

4) For each $1 \le i \le m$, compute
$x_{i0} = \{u_i - \sum_{j=1}^{n} x_{ij} u_j - u_{n+1} r_i\}\, u_0^{-1} \pmod p$.

5) Let $\tilde{X} = (\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_m)^T$, where
$\tilde{x}_i = (x_{i0}, x_{i1}, \ldots, x_{in}, r_i)$ for all $1 \le i \le m$.

6) Output $x$, $c$ and the public parameters $g_0, g_1, \ldots, g_{n+1}$,
$p$, $q$, where $x$ is the linear combination $x = \sum_{i=1}^{m} c_i \tilde{x}_i$.

Fig. 2. The basic verification scheme
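
Step 4 can be checked on toy parameters. The sketch below assumes the product form of the hash, $\prod_i g_i^{x_i} \cdot g_{n+1}^{r} \pmod q$, which is inferred from Eq. (3); the constants, the exponent list `U` and the function names are hypothetical. It computes the padding symbol $x_{i0}$ and confirms that the padded packet hashes to the intended public value $g_i$.

```python
P_MOD, Q_MOD, GEN = 11, 23, 4      # toy parameters: GEN has order 11 modulo 23
U = [3, 7, 2, 9, 5]                # hypothetical secrets u_0..u_{n+1}, n = 3
G_PUB = [pow(GEN, u, Q_MOD) for u in U]

def hash_packet(x):
    """Hash x = (x_0, ..., x_n, r) using only the public values g_i."""
    h = 1
    for g_i, x_i in zip(G_PUB, x):
        h = h * pow(g_i, x_i, Q_MOD) % Q_MOD
    return h

def pad(i, body, r):
    """Pad packet i with body (x_i1, ..., x_in) and parity symbol r_i.

    Solves u_0 * x_i0 + sum_j u_j * x_ij + u_{n+1} * r_i = u_i (mod p)
    for x_i0, which needs the secret exponents and an invertible u_0.
    """
    s = sum(u * v for u, v in zip(U[1:-1], body)) + U[-1] * r
    x_i0 = (U[i] - s) * pow(U[0], -1, P_MOD) % P_MOD
    return (x_i0, *body, r)

packet_1 = pad(1, (4, 10, 6), 2)
packet_2 = pad(2, (1, 0, 8), 5)
assert hash_packet(packet_1) == G_PUB[1]
assert hash_packet(packet_2) == G_PUB[2]
```

Because solving for $x_{i0}$ uses $u_0, \ldots, u_{n+1}$, only the holder of the secret exponents can produce a padding that hits a chosen public value.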

Verification Algorithm: During verification, each node
is given a packet $x$ and public information $t$. In the case
where this packet has not been tampered with, $x$ is the linear
combination $x = \sum_{i=1}^{m} c_i \tilde{x}_i$, and $t$ represents the public parameters
$g_0, g_1, \ldots, g_{n+1}, p, q$ and $c$.

Each node can verify the integrity of the packet as follows

1) From $c$, compute $c_1, c_2, \ldots, c_m$.

2) Compute the hash value $\mathcal{H}_1 = \mathcal{H}(x)$.

3) Compute the hash value $\mathcal{H}_2 = \prod_{i=1}^{m} h_i^{c_i} \pmod q$
($h_i = g_i$, for $1 \le i \le m$).

4) Verify that $\mathcal{H}_1 = \mathcal{H}_2$.
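
The four verification steps can be exercised end to end on toy parameters. This is a sketch under several assumptions: the hash uses the product form inferred from Eq. (3), Python's `random.Random(seed)` stands in for the shared generator $G$ (it is not cryptographically secure), and all constants and function names are hypothetical.

```python
import random

P_MOD, Q_MOD, GEN = 11, 23, 4      # toy parameters: GEN has order 11 modulo 23
U = [3, 7, 2, 9, 5]                # hypothetical secrets u_0..u_{n+1}, n = 3
G_PUB = [pow(GEN, u, Q_MOD) for u in U]

def hash_packet(x):
    h = 1
    for g_i, x_i in zip(G_PUB, x):
        h = h * pow(g_i, x_i, Q_MOD) % Q_MOD
    return h

def pad(i, body, r):
    """Source side only: choose x_i0 so that the packet hashes to g_i."""
    s = sum(u * v for u, v in zip(U[1:-1], body)) + U[-1] * r
    x_i0 = (U[i] - s) * pow(U[0], -1, P_MOD) % P_MOD
    return (x_i0, *body, r)

def coefficients(seed, m):
    """Step 1: every node derives c_1..c_m from the shared seed."""
    rng = random.Random(seed)      # stand-in for the public generator G
    return [rng.randrange(1, P_MOD) for _ in range(m)]

def combine(packets, coeffs):
    """x = sum_i c_i * x_i, component-wise modulo p."""
    return tuple(sum(c * pkt[k] for c, pkt in zip(coeffs, packets)) % P_MOD
                 for k in range(len(packets[0])))

def verify(x, seed, m):
    """Steps 2-4: compare H(x) with prod_i g_i^{c_i} mod q."""
    h1 = hash_packet(x)
    h2 = 1
    for i, c in enumerate(coefficients(seed, m), start=1):
        h2 = h2 * pow(G_PUB[i], c, Q_MOD) % Q_MOD
    return h1 == h2

packets = [pad(1, (4, 10, 6), 2), pad(2, (1, 0, 8), 5)]
x = combine(packets, coefficients(42, 2))
tampered = (x[0], (x[1] + 1) % P_MOD, *x[2:])
print(verify(x, 42, 2), verify(tampered, 42, 2))
```

The verifier uses only public values and the seed; with such a small field the example is purely illustrative and offers no real security.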

In our scheme, instead of transmitting the coefficients, every
node selects and distributes random values to all its
downstream nodes. The coefficients are generated by a shared
pseudo-random number generator in each node, and the global
encoding kernel can be calculated recursively in any
upstream-to-downstream order.

In practice, the need for distributing random values can 
be further eliminated by using a public random function. For 
example, it can be the SHA-1 hash of the original file identifier, 
creation date, publisher, and other data that are public and 
should be known to all the receivers before the download 
session begins. 
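
Deriving the shared seed from public data can be sketched as below. The field names, the separator and the example values are hypothetical; the text only specifies that a SHA-1 hash of public metadata is used.

```python
import hashlib

def public_seed(file_id, creation_date, publisher):
    """Derive a seed that every receiver can recompute from public metadata."""
    material = "|".join([file_id, creation_date, publisher]).encode("utf-8")
    digest = hashlib.sha1(material).digest()     # 20 bytes
    return int.from_bytes(digest, "big")

seed = public_seed("example-file-001", "2008-01-01", "example-publisher")
print(hex(seed))
```

Every node that knows the same metadata obtains the same seed, so no random values need to be sent ahead of the download session.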

Our verification scheme enables the nodes to check the 
integrity of packets without the requirement for a secure 
channel. Also, the computation involved in the hash values 
generation and verification processes is very simple. 



IV. Security of our algorithm 

A. Security against the contamination adversaries 

It can be shown that the basic verification scheme is indeed 
secure if DL[g,p, q) is hard, using an argument similar to that 
in 1 1 1 (proof of Theorem 3). 

Theorem 1: It is computationally infeasible to find
$X = (x_1, x_2, \ldots, x_m)^T$, $y$ and $c = (c_1, c_2, \ldots, c_m)$ such that, for
$x = \sum_{i=1}^{m} c_i x_i$, we have $y \ne x$ and $\mathcal{H}(x) = \mathcal{H}(y)$; namely,
the basic scheme is secure if $DL[g, p, q]$ is hard.

Proof: We prove this theorem by showing that if there is a
polynomial-time algorithm A that finds $X = (x_1, x_2, \ldots, x_m)^T$,
$y$ and $c$ as in the theorem statement (that is, with
$y \ne x$ and $\mathcal{H}(x) = \mathcal{H}(y)$) with probability $p$ that
is not negligible, then we can use it to construct a polynomial-time
algorithm B that finds a collision $x$ and $y$ in $\mathcal{H}$ with the same
non-negligible probability $p$.

However, if $DL[g, p, q]$ is hard, Lemma 2 shows that the hash
function $\mathcal{H}$ is collision-free, and thus $p$ should be negligible,
which is a contradiction. Therefore, the basic scheme is
secure if the discrete logarithm problem $DL[g, p, q]$ is hard. ■

B. Security against the eavesdropping adversaries 

Theorem 2: Given a network in which the number of
independent messages available to Calvin is less than the multicast
capacity, i.e., $k = \max_i k_i < m$, the algorithm in Section III-B
achieves the practically secure condition with probability
one when a random code is used.

Proof: In our algorithm, Alice transmits $\tilde{X}$ instead of $X$,
so the message available to Calvin is $A_i \tilde{X}$. As long as Calvin
cannot obtain $r_1, r_2, \ldots, r_m$ from $A_i \tilde{X}$, he cannot get the global
encoding kernel of $X$, and without it he cannot get any meaningful
information about $X$. So, by taking linear combinations of the observed packets
$A_i \tilde{X}$, Calvin should not be able to recover $r_1, \ldots, r_m$, which
implies

$$b_i A_i \tilde{X} \ne I \tilde{X} \quad (\forall b_i, i) \qquad (9)$$

where $b_i$ is a $k_i \times m$ matrix over $F_p$ and $I$ is an $m \times m$
identity matrix.

Since the number of independent messages available to the
eavesdropper is less than the multicast capacity of the network,
condition (9) can always be satisfied. ■

Moreover, Calvin cannot get any packet of $X$ merely by obtaining
the values $r_1, r_2, \ldots, r_m$, which implies

$$b_i A_i P \ne I_{m,n} \quad (\forall b_i, n, i) \qquad (10)$$

Multiplying both sides by $P^{-1}$, we have

$$b_i A_i \ne I_{m,n} P^{-1} \quad (\forall b_i, n, i) \qquad (11)$$

where $I_{m,n}$ is the $n$th row of an $m \times m$ identity matrix and
$b_i$ is a $k_i \times m$ matrix over $F_p$. The above condition is satisfied
if no row of $P^{-1}$ is in the row space of any $A_i$.
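
The row-space condition can be tested mechanically with a rank computation over $F_p$. The sketch below is an illustration with hypothetical toy data ($p = 7$, a single tapped kernel): a row lies in the row space of $A_i$ exactly when appending it does not increase the rank.

```python
def rank_mod_p(rows, p):
    """Rank of a matrix over F_p (p prime), by Gauss-Jordan elimination."""
    rows = [[v % p for v in row] for row in rows]
    rank = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, p)        # modular inverse (Python 3.8+)
        rows[rank] = [v * inv % p for v in rows[rank]]
        for r in range(len(rows)):
            if r != rank and rows[r][col]:
                f = rows[r][col]
                rows[r] = [(a - f * b) % p for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank

def in_row_space(row, a, p):
    """True if `row` is a linear combination of the rows of `a` over F_p."""
    return rank_mod_p(a + [row], p) == rank_mod_p(a, p)

A_i = [[1, 1, 0]]                  # hypothetical kernel seen by Calvin, k_i = 1
print(in_row_space([2, 2, 0], A_i, 7), in_row_space([1, 0, 0], A_i, 7))
```

In a full check, `in_row_space` would be applied to every row of $P^{-1}$ and every $A_i$; the condition holds when all calls return `False`.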
Theorem 3: In a network that supports a multicast capacity
of $m$, if at most $k$ ($k < m$) edges can be tapped
simultaneously, then the multicast capacity under practical security
requirements is $m - \frac{2m}{n}$ (here, the asymptotically negligible
term $\frac{2m}{n}$ corresponds to the overhead due to the redundancy
Alice appends to $X$).

Proof: The network supports a multicast capacity of $m$, so
a linear code can be found to multicast $m$ packets [3]. By
Theorem 2, a transformation at the source can be applied to
make it practically secure. ■

V. Discussion 

In this section, we examine the overhead and the
start-up latency induced by our scheme. For a fair comparison, we
choose $n = 410$, $|p| = 320$ and $|q| = 1024$ in the following
discussion. The size of every packet is 16 KB. In addition,
we assume that the original file is divided into $m = 10$ packets.

The comparisons are shown in Fig. 3.

A. Communication Overhead 

The communication overhead has two components.
The first is the amount of data we need
to distribute to each node for the security of the scheme. The
second is the code coefficients. The actual communication
overhead largely depends on the parameters chosen for the
actual implementation.

In the scheme proposed by Krohn [9], the parameters chosen
for the homomorphic hash function generate a hash
value of size 1024 bits per packet. The total size of the coefficients
is 3200 bits per packet. Hence, the total size of the
"first-order" hash values and the coefficients is 3.22% of the
original data. For a file of size 1 GB, their method would
require hash values of size 8 MB. To distribute these hash
values, the authors of [9] proposed to recursively apply
the same scheme to the 8 MB of hash values, which
generates "second-order" or higher-order hash values. The
size of the higher-order hash values constitutes 0.01% of the
size of the original data. Hence the total overhead is 3.23%.

In the scheme proposed by Zhao [11], if the file is divided
into 10 packets, each packet is a vector over $F_q$. The size of
each packet is also about 16 KB. The size of each augmented
vector (with coding vectors in the front) is about 16.4 KB,
and thus the overhead of each packet is 2.43%. In addition,
after the initial setup, the scheme of [11] has to publish
3200 bits for the new signature vector for the security of the
scheme. Thus the total overhead of their scheme is 4.86%. In
conclusion, although they proposed a simple signature scheme,
the communication overhead of their scheme is very high.

The scheme in [10] requires padding of three values and
also has to distribute the coefficients. The overhead
caused by the coefficients themselves is 2.44%. Therefore the
communication overhead of their scheme is higher than ours,
although they use the technique of batch verification. In our
scheme, the coefficients are generated by a pseudo-random
number generator in each node, so their distribution is
avoided.

The communication overhead of our scheme is caused only
by the padding we add to every packet. Each distributed packet
incurs only 0.48% overhead, which is negligible compared with
previous works. Formally, let the file size be $S$, and let the file
be divided into $m$ packets, each
of which is a vector over $F_p$. The size of each vector is
$B = n \log(p)$ and we have $S = mn \log(p)$. The size of each
augmented vector (with the padding in the front and the back)
is $B_a = (n + 2) \log(p)$, and thus the overhead is
$\frac{2}{n}$ times the file size. Note that the communication overhead
of our scheme is asymptotically negligible.
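
The percentages quoted in this subsection follow from the stated parameters ($n = 410$, $|p| = 320$, $|q| = 1024$, $m = 10$). The short calculation below reproduces them; the variable names are chosen here for illustration.

```python
n, p_bits, q_bits, m = 410, 320, 1024, 10

packet_bits = n * p_bits                  # 131200 bits, about 16 KB per packet
coeff_bits = m * p_bits                   # 3200 bits of coefficients per packet

ours = 2 / n                              # two padding symbols per packet
krohn_first_order = (q_bits + coeff_bits) / packet_bits
coefficients_only = coeff_bits / packet_bits

print(f"ours:              {ours:.4%}")
print(f"hash + coeffs [9]: {krohn_first_order:.4%}")
print(f"coeffs only:       {coefficients_only:.4%}")
```

The padding overhead $2/n$ shrinks as the packet length $n$ grows, which is the sense in which it is asymptotically negligible.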

B. Start-Up Latency 

At the beginning of a content distribution session, the source
and all the nodes participating in the distribution have to agree
on the set of parameters used for coding and verification.
The public parameters in our scheme are $p, q, g_0, g_1, \ldots, g_{n+1}$,
and their total size is approximately
16.3 KB. These parameters are sufficient for any
node to perform verification. Assuming that the bandwidth
between a node and the source (or any other node from which
these parameters are distributed) is 1 Mbps, it takes
about 0.127 seconds before the node is ready to perform
verification. The start-up latency in our scheme is fixed once
the parameters for the hash function and the block size are
chosen, and is independent of the size of the content to be
distributed. The start-up latency of [10] is 0.127 seconds, more
or less the same as ours. For the scheme in [9], the size of
all the public parameters is the same as the size of the data
in a packet, which is 16 KB. It takes 0.125 seconds to
transmit them on the same link. However, when the node needs
to receive 8 MB of hash values for a 1 GB file, as in the example
given in [9], it requires 64 seconds on the same 1
Mbps link. The start-up latency is thus proportional to the size
of the file. The public information of [11] consists of two parts:
the public parameters and the signature vectors. The size of
their public parameters is 32.8 KB, and it takes 0.25 seconds
to transmit them on the same link. Furthermore, they have to
publish new signature vectors for the security of their scheme
in every setup.
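
The latency figures above can be reproduced from the quoted sizes. The sketch assumes that "1 Mbps" is read as $2^{20}$ bit/s and that KB and MB are binary units; this reading is an assumption, chosen because it matches the quoted numbers.

```python
LINK_BPS = 2 ** 20          # assumption: "1 Mbps" read as 2^20 bit/s
KB, MB = 1024, 1024 ** 2    # assumption: binary units

def transfer_seconds(size_bytes):
    """Time to send size_bytes over the link, ignoring protocol overhead."""
    return size_bytes * 8 / LINK_BPS

sizes = {
    "ours (16.3 KB parameters)": 16.3 * KB,
    "[9] parameters (16 KB)": 16 * KB,
    "[9] hash values for 1 GB (8 MB)": 8 * MB,
    "[11] parameters (32.8 KB)": 32.8 * KB,
}
for label, size in sizes.items():
    print(f"{label}: {transfer_seconds(size):.3f} s")
```

Only the 8 MB entry grows with the file size; the other entries depend on the hash parameters and block size alone.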

VI. Conclusion 

In this paper, we investigate the security issues that arise
from using network coding and propose a secure algorithm.
With our algorithm, every node can easily verify the integrity
of the received packets, and an eavesdropper is unable
to get any meaningful information about the source. We show
that, when we give up a small amount of overall capacity, the
practically secure condition can be achieved with probability
one, which is much higher than that of [1]. We also propose a
new paradigm in which the public parameters are selected as the
"intended" hash values and the code coefficients are generated
by a pseudo-random number generator in every node. In this
way the distribution of both the hash values and the coefficients is
avoided. We have shown that the communication overhead of
our algorithm is $\frac{2}{n}$ times the file size, which is negligible
compared with previous works, and that the start-up latency is short.
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Abstract — In this paper, we propose an algorithm that targets 
contamination and eavesdropping adversaries. We consider the 
case when the number of independent packets available to the 
eavesdropper is less than the multicast capacity of the network. 
By means of our algorithm every node can verify the integrity 
of the received packets easily and an eavesdropper is unable 
to get any "meaningful information"about the source. We call 
it "practical security"if an eavesdropper is unable to get any 
meaningful information about the source. We show that, by giving 
up a small amount of overall capacity, our algorithm achieves 
achieves the practically secure condition at a probability of one, 
which is much higher than that of Bhattad and Narayanan's 1 1 1. 
Furthermore, the communication overhead of our algorithm are 
negligible compared with previous works, since the transmission 
of the hash values and the code coefficients are both avoided. 

I. Introduction 

The concept of network coding was first introduced by 
Ahlswede et al. Q. They showed that multicast rates could 
be increased by allowing for network coding instead of just 
routing. Shortly afterwards, Li, Yeung and Cai Q showed that 
it is sufficient for the encoding functions at the interior nodes 
to be linear. Ho et al. |H) and |0 proposed a random coding 
scheme in which the message on outgoing edges of a node 
are chosen to be a random linear combination of the message 
on its incoming edges. 

In reality, network transmission may suffer from two kinds 
of adversaries: contamination and eavesdropping. Network 
coding has been studied to con-quer these two kinds of 
adversaries. Ho et al. |6) considered the problem of network 
coding in the presence of Byzantine attacker. Gkantsidis et 
al. also considered the related problem. Jaggi et al. 
[8 1 designed a resilient network coding algorithm which is 
information-theoretically secure and rate-optimal for different 
adversarial strengths. Homomorphic hashing function was first 
proposed in J5), which allows nodes to check blocks on-the- 
fly in a system where content is encoded at the source using 
rateless codes. However, the total size of the hash values of 
their scheme is proportional to the number of blocks, which 
could be very large and the cryptographic hash function is 
computationally expensive. Li et al. [ 1 1 employed a batch 
content distribution verification scheme, which reduced the 
computational cost of each node to cache and scan all the 
received packets when computing a new packet. The cryp- 
tographic hash function of their scheme is computationally 



inexpensive compared with which in [9|. Unfortunately, their 
scheme deviate from the classical network coding scheme, 
which is bandwidth consumed and delay could be induced at 
the sinks. On the other hand, although batching can decrease 
the computation time, batching block verification has the risk 
of letting some malicious packets propagate since packets are 
exchanged without being checked. Thus, standard batching 
techniques do not work well with network coding. Zhao et 
al. ifTTI presented a signature scheme with low computation, 
but their scheme required long start-up latency. Finally, all 
the works presented above have to distribute the coefficients 
which is bandwidth consumed. 

Cai and Yeung lfT2ll considered the problem of using net- 
work coding to achieve perfect information security against 
an eavesdropper who can eavesdrop on a limited number 
of network links, and presented the construction of a secure 
linear network code for this purpose. A similar problem was 
considered in ifFJl featuring a random coding approach in 
which only the input vector is modified. 

Bhattad and Narayanan [ 1 1 first defined a model for security 
that is more suitable for practical applications. In this paper, 
we also consider this type of model, which is not information 
theoretically secure, but is secure enough for the application. 
An interesting observation made in lfT4ll was that for a compu- 
tation limited eavesdropper with the use of one way function it 
is possible to transmit at a high rate without the eavesdropper 
getting any meaningful information about the source. A more 
general threat posed by intermediate nodes was considered in 

urn 

In this paper, we consider these two kinds of adversaries 
at the same time, that is, the adversary can contaminate the 
transmission on a subset of channels, and at the same time 
eavesdrop on another subset of channels with cardinality less 
than or equal to to. Ngai and Yang [16] studied the similar 
problem and constructed a secure error-correcting network 
codes. 

The main contribution of this paper is to propose an 
algorithm, which can not only verify the integrity of the 
received packets easily but also achieve the practically secure 
condition at a probability of one. In our scheme, we use the 
public parameters as the "intended"hash values. The original 
packets are padded so that they are hashed to the public 
parameters. In this way the transmission of the hash values 



is avoided. The code coefficients in our scheme are generated 
in a pseudo-random number generator in each node, so the 
distribution of the coefficients is also avoided. We show 
that the communication overhead and the start-up latency are 
negligible since the transmission of the hash values and the 
coefficients are both avoided. 

This paper is organized as follows. In the next section we 
give the notations used in this paper. The secure network 
coding scheme is proposed in section III. In Section IV, we 
present the security of our algorithm. Overhead and start-up 
latency of our algorithm are discussed in Section V. Finally, 
this paper is concluded in section VI. 

II. Network model and notions 

In this paper, we assume that all the messages and co- 
efficients are generated in ¥ p , where p is a large enough 
prime number, we shall use small letters x,y etc. to denote 
vectors whose dimensions will be clear from the context. The 
matrices are denoted by the capital letters such as X, Xetc. The 
transpose operator of vectors and matrices will be denoted by 
"T"thus x T will stand for column vectors. 

A. Network Model 

We represent a network by a directed graph G = (V;E), 
where V is the set of vertices (nodes) and E is the set of edges 
(channels). We assume an order on V which is consistent with 
the associated partial order on G. A network code is said to 
be linear if the message on any outgoing edge of any node is 
a linear combination of the messages on the incoming edges 
of the node. 

In this paper, we assume that the source node sends infor- 
mation X of the following form: 
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Therefore, for a linear code the message on edge Bj G E 
can be written as .F e X where T e . is a length m vector over 
F p (we call it global encoding kernel in this paper ) on edge 
e, G E. 

B. Threat Model 

There is a source, Alice, and a destination, Bob, who
communicate over a wired or wireless network. There is also
an eavesdropper, Calvin, hidden somewhere in the network. He
aims to eavesdrop on the transfer of information from Alice
to Bob and to inject his own packets. A malicious node can
generate corrupted packets and then distribute them to other
nodes, which in turn use them to (unintentionally) create new
encoded packets that are also corrupted. A wiretap network is
specified by a collection $\mathcal{A}$ of sets of edges,
$\mathcal{A} = \{A_1, A_2, \ldots, A_{|\mathcal{A}|}\}$,
$A_i \subseteq E$. Calvin selects a particular set
$A_i \in \mathcal{A}$ and listens to all messages transmitted
on the edges in $A_i$ to get some information. We assume that
the set does not change with time. Given a linear code and a
wiretap network, we also use $A_i$ to denote a matrix whose
rows are the linearly independent global encoding kernels
corresponding to the edges $e_j \in A_i$. In this case, the
messages available to Calvin are $A_i X$. The number of rows
in $A_i$ is denoted by $k_i$. We define $k = \max_i k_i$.

Fig. 1. Networks

C. Notions 

1) Network capacity: The network capacity is the time-average
of the maximum number of packets that can be delivered from
Alice to Bob, assuming no adversarial interference, i.e., the
max-flow. To simplify notation, in this paper we assume that
the max-flow from Alice to Bob is m.

2) Practical security: Consider a set of messages M. Let
U be a subset of the set containing the multicast information X.
We say that M has no information about U if $I(U; M) = 0$.
We say that M has no meaningful information about U if
$I(x_i; M) = 0, \forall x_i \in U$. In this paper we concentrate
on two special cases and generalize the results towards the end.
We say that Calvin has no information about the source if
$I(X; M) = 0$, where M is the set of messages that Calvin
chooses to observe. The security condition considered by Cai
and Yeung [12] falls in this category. We will use Shannon
security to refer to this security requirement. The second case
we consider is when Calvin gets no meaningful information
about the source, i.e., $I(x_i; M) = 0, \forall x_i$, for the
messages M observed by Calvin. We call this type of security
practical security.

Note that if Alice transmits a linear transformation
of X, PX, instead of X, then the message transmitted on
edge $e_j$ is $\Gamma_{e_j} P X$ (P is an $m \times m$ matrix
which is unknown to Calvin). In this case, although Calvin has
some information about the source, he is unable to get any
meaningful information.

As shown in Fig. 1, let us assume that Calvin can listen to
any one edge of this network. The multicast capacity of this
network is 2, and $x_1$ and $x_2$ are the messages of Alice.
In Fig. 1(a), w is a uniform random sequence independent of
the messages. This is an example of the coding scheme
constructed by Cai and Yeung [12]. The maximum multicast rate
that can be supported is 1 when this system has to be Shannon
secure. When the security condition is relaxed to practical
security, as shown in Fig. 1(b), the max-flow can be achieved.
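The difference between the two security notions can be checked by brute force on a toy field. The sketch below is only an illustration: the field size 5 and the two edge functions (`mixed`, `plain`) are assumptions chosen for the example and are not the edge labels of Fig. 1. It enumerates all pairs $(x_1, x_2)$ and tests whether an observation changes the distribution of an individual symbol.

```python
from collections import Counter
from itertools import product

P = 5  # toy prime field size (assumption for illustration)

def conditional_counts(observe, index):
    """For each observed value, count how often each value of x[index] occurs."""
    table = {}
    for x in product(range(P), repeat=2):      # x = (x1, x2), uniform over F_p^2
        table.setdefault(observe(x), Counter())[x[index]] += 1
    return table

def leaks_symbol(observe, index):
    """True if some observation makes x[index] non-uniform, i.e. I(x_i; M) > 0."""
    for counts in conditional_counts(observe, index).values():
        if len(counts) != P or len(set(counts.values())) != 1:
            return True
    return False

def mixed(x):
    return (x[0] + x[1]) % P    # hypothetical coded edge carrying x1 + x2

def plain(x):
    return x[0]                 # hypothetical uncoded edge carrying x1

mixed_leaks = [leaks_symbol(mixed, i) for i in (0, 1)]
plain_leaks = [leaks_symbol(plain, i) for i in (0, 1)]
```

Under these assumptions the coded edge leaves each individual symbol uniformly distributed (no meaningful information), although it does reveal one linear relation between $x_1$ and $x_2$, so it is not Shannon secure. The uncoded edge reveals $x_1$ outright.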

III. Secure network coding 

A. The Homomorphic Hash Function 

We first choose the hash parameters q and g. Let o(x) denote
the order of x in the field $\mathbb{F}_q$. Here we choose g
such that o(g) = p in $\mathbb{F}_q$ (this requires p to divide
q - 1). Furthermore, we randomly select n + 2 numbers
$u_0, u_1, \ldots, u_n, u_{n+1}$ from $\mathbb{F}_p$. Next, we
compute $g_i = g^{u_i} \pmod q$ for all $0 \le i \le n + 1$. The
public parameters of the hash function are
$p, q, g_0, g_1, \ldots, g_{n+1}$, whereas
$u_0, u_1, \ldots, u_{n+1}$ and g should be kept secret.

Formally, we define DL[g,p,q] to be the following computational
problem: given y, g and q, where o(g) = p in $\mathbb{F}_q$,
find x such that $y = g^x \pmod q$. Hence, we have

Lemma 1: Given $g_0, g_1, \ldots, g_{n+1}$ and the public
parameters p, q, it is computationally infeasible for a node to
find $u_0, u_1, \ldots, u_{n+1}$ such that
$g_i = g^{u_i} \pmod q$ if DL[g,p,q] is hard.

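Assume that each message is of the form
$x = (x_0, x_1, \ldots, x_n, r)$, where $x_i, r \in \mathbb{F}_p$
for $0 \le i \le n$, and the hash of x is computed as

$$\mathcal{H}(x) = \left( \prod_{i=0}^{n} g_i^{x_i} \right) g_{n+1}^{r} \pmod q \tag{2}$$

Based on this construction, we have

$$\mathcal{H}(x) = g^{\left( \sum_{i=0}^{n} u_i x_i + u_{n+1} r \right) \bmod p} \pmod q \tag{3}$$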



For any two messages $x = (x_0, x_1, \ldots, x_n, r_1)$ and
$y = (y_0, y_1, \ldots, y_n, r_2)$, we define the addition of x
and y as

$$x + y = (z_0, z_1, \ldots, z_n, r) \tag{4}$$

where $r = (r_1 + r_2) \bmod p$ and $z_i = (x_i + y_i) \bmod p$
for $0 \le i \le n$.

Hence, this hash function has the following homomorphic
property:

$$\mathcal{H}(x)\,\mathcal{H}(y) = \mathcal{H}(x + y) \tag{5}$$
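The homomorphic property can be checked numerically. The following is a minimal sketch with deliberately tiny, insecure parameters that are assumptions made for the example (q = 23, p = 11, g = 2, which has order 11 modulo 23, and n = 4); the function names are hypothetical. The secret exponents are drawn from 1..p-1 here for simplicity.

```python
import random

# Toy parameters (assumptions for illustration; far too small to be secure).
Q = 23          # prime modulus q
P = 11          # prime order p, with p dividing q - 1
G = 2           # 2 has order 11 modulo 23
N = 4           # a packet carries x_0..x_N plus the parity symbol r

def make_keys(rng):
    """Pick secret exponents u_0..u_{N+1} and public bases g_i = g^{u_i} mod q."""
    u = [rng.randrange(1, P) for _ in range(N + 2)]
    g = [pow(G, ui, Q) for ui in u]
    return u, g

def hash_packet(packet, g):
    """H(x) = prod_{i=0}^{n} g_i^{x_i} * g_{n+1}^{r} mod q for packet (x_0..x_n, r)."""
    h = 1
    for base, symbol in zip(g, packet):
        h = (h * pow(base, symbol, Q)) % Q
    return h

def add_packets(x, y):
    """Component-wise addition modulo p, as in the definition of x + y."""
    return tuple((a + b) % P for a, b in zip(x, y))

rng = random.Random(0)
u, g = make_keys(rng)
x = tuple(rng.randrange(P) for _ in range(N + 2))
y = tuple(rng.randrange(P) for _ in range(N + 2))

lhs = (hash_packet(x, g) * hash_packet(y, g)) % Q       # H(x) H(y)
rhs = hash_packet(add_packets(x, y), g)                 # H(x + y)
exponent_form = pow(G, sum(ui * xi for ui, xi in zip(u, x)) % P, Q)
```

Here `lhs` equals `rhs`, and `exponent_form` equals `hash_packet(x, g)`, which is the exponent form of the hash. Reducing the symbols modulo p is harmless because every $g_i$ has order dividing p.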



The security of $\mathcal{H}$ is defined in terms of the
difficulty of finding collisions. In particular, we have:

Lemma 2: The hash function $\mathcal{H}$ is collision-free
(namely, it is computationally infeasible to find two different
messages $x_1$ and $x_2$ such that
$\mathcal{H}(x_1) = \mathcal{H}(x_2)$) if DL[g,p,q] is hard.

It can be proved that the hash function is indeed
collision-free, using an argument in [11] (proof of Theorem 3.4).



B. Alice's Encoder and Bob's Decoder 

Alice's encoder: Alice encodes X in the following steps.
She first chooses m parity symbols $r_d$, for
$d \in \{1, \ldots, m\}$, uniformly at random from the field
$\mathbb{F}_p$ and then generates a Vandermonde matrix P as
follows

In the second step, Alice pre-multiplies the source message
X with P

In the third step, Alice adds $r_1, r_2, \ldots, r_m$ to the
result and gets the encoded message $\tilde{X}$ as follows

Alice uses $g_i$, $1 \le i \le m$, as the "intended" hash
values of $x_1, x_2, \ldots, x_m$ (this is possible since any
practical network coding system would have $m \ll n$). A list
of paddings $x_{10}, x_{20}, \ldots, x_{m0}$ can be computed. We
add the padding to every packet and get the padded message, as
shown in Fig. 1.

When a message packet $x_i$ is padded with $x_{i0}$ to form the
new packet $\bar{x}_i$, then $\mathcal{H}(\bar{x}_i) = g_i$,
$1 \le i \le m$.

Bob's decoder: Bob first decodes the encoded message and gets
$r_1, r_2, \ldots, r_m$. The Vandermonde matrix P can be
computed from $r_1, r_2, \ldots, r_m$. Bob then pre-multiplies
the associated matrix PX with $P^{-1}$ and gets the original
message X.
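The encode/decode round trip can be sketched as follows. This is an illustration under stated assumptions, not the paper's exact construction: the Vandermonde convention (row i is $1, r_i, r_i^2, \ldots$), the toy prime 257, the choice of distinct $r_d$ (so that P is invertible), and all function names are hypothetical. The parity symbols are appended as a last column so that Bob can rebuild P.

```python
import random

P_MOD = 257  # toy prime (assumption); the paper uses a much larger p

def vandermonde(r):
    """Row i is (1, r_i, r_i^2, ..., r_i^{m-1}) mod p; one common convention."""
    m = len(r)
    return [[pow(ri, j, P_MOD) for j in range(m)] for ri in r]

def mat_mul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) % P_MOD for col in zip(*b)]
            for row in a]

def mat_inv(a):
    """Gauss-Jordan inversion over F_p; fails with StopIteration if singular."""
    m = len(a)
    aug = [row[:] + [int(i == j) for j in range(m)] for i, row in enumerate(a)]
    for col in range(m):
        pivot = next(i for i in range(col, m) if aug[i][col] % P_MOD)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], -1, P_MOD)
        aug[col] = [v * inv % P_MOD for v in aug[col]]
        for i in range(m):
            if i != col and aug[i][col]:
                f = aug[i][col]
                aug[i] = [(vi - f * vc) % P_MOD for vi, vc in zip(aug[i], aug[col])]
    return [row[m:] for row in aug]

rng = random.Random(1)
m, n = 3, 5
X = [[rng.randrange(P_MOD) for _ in range(n)] for _ in range(m)]
r = rng.sample(range(1, P_MOD), m)          # distinct values, so P is invertible

# Alice: mix the packets with P, then append r_d to row d.
encoded = [row + [rd] for row, rd in zip(mat_mul(vandermonde(r), X), r)]

# Bob: read r from the last column, rebuild P, and undo the mixing.
r_received = [row[-1] for row in encoded]
PX = [row[:-1] for row in encoded]
decoded = mat_mul(mat_inv(vandermonde(r_received)), PX)
```

With distinct parity symbols, `decoded` equals `X`. The modular inverse via `pow(x, -1, p)` needs Python 3.8 or later.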

C. The Basic Verification Scheme 

As shown in Fig. 1, the message can only be padded using the
secret key, which is known only to Alice. Next, Alice chooses
a seed c and feeds it to a pseudo-random generator G. Instead
of choosing the coefficients directly, the source uses the
random numbers $c_1, c_2, \ldots, c_m$ generated by G as the
"intended" coefficients. Since the coefficients can be computed
from the public function G, there is no need to distribute the
coefficients; it suffices that all the nodes know c.

Our proposed scheme consists of two algorithms, namely 
the encoding algorithm and the verification algorithm. 

Encoding Algorithm: The encoder performs the following steps.

1) Choose a random seed c.

2) Generate pseudo-random numbers $c_1, c_2, \ldots, c_m$ from G
with c.

3) For each $1 \le i \le m$, choose $g_i = g^{u_i} \pmod q$ as
the "intended" hash values.

4) For each $1 \le i \le m$, compute
$x_{i0} = \left( u_i - \sum_{j=1}^{n} x_{ij} u_j - u_{n+1} r_i \right) u_0^{-1} \pmod p$.

5) Let $\bar{X} = (\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_m)^T$,
where $\bar{x}_i = (x_{i0}, x_{i1}, \ldots, x_{in}, r_i)$ for all
$1 \le i \le m$.

6) Output x, c and the public parameters
$g_0, g_1, \ldots, g_{n+1}, p, q$, where x is the linear
combination $x = \sum_{i=1}^{m} c_i \bar{x}_i$.

Fig. 2. The basic verification scheme
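The padding in step 4 can be checked on a toy instance. This is a minimal sketch under assumptions made for the example: the tiny, insecure parameters q = 23, p = 11, g = 2, the sizes n = 4 and m = 3, nonzero secret exponents (so that $u_0$ is invertible), and hypothetical function names. It confirms that each padded packet hashes to its intended value $g_i$.

```python
import random

Q, P, G = 23, 11, 2     # toy q, p, g with ord(g) = p (assumption; insecure)
N, M = 4, 3             # n data symbols per packet, m packets (m <= n + 1)

rng = random.Random(7)
u = [rng.randrange(1, P) for _ in range(N + 2)]     # secret u_0..u_{n+1}
g = [pow(G, ui, Q) for ui in u]                     # public g_0..g_{n+1}

def hash_packet(packet):
    """H(x) = prod_i g_i^{x_i} * g_{n+1}^{r} mod q for packet (x_0..x_n, r)."""
    h = 1
    for base, symbol in zip(g, packet):
        h = h * pow(base, symbol, Q) % Q
    return h

def pad(i, data, r):
    """Step 4: x_i0 = (u_i - sum_{j=1}^{n} x_ij u_j - u_{n+1} r) * u_0^{-1} mod p."""
    s = sum(xj * uj for xj, uj in zip(data, u[1:N + 1]))
    x_i0 = (u[i] - s - u[N + 1] * r) * pow(u[0], -1, P) % P
    return (x_i0, *data, r)

packets = []
for i in range(1, M + 1):
    data = [rng.randrange(P) for _ in range(N)]     # x_i1..x_in
    packets.append(pad(i, data, rng.randrange(P)))

# Each padded packet should hash to the intended value g_i.
hits = [hash_packet(pkt) == g[i] for i, pkt in enumerate(packets, start=1)]
```

Only the holder of the secret exponents can compute `pad`, which is why the padding acts as the source's authentication of each packet.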

Verification Algorithm: During verification, each node
is given a packet x and public information t. In the case
where this packet has not been tampered with, x is the linear
combination $x = \sum_{i=1}^{m} c_i \bar{x}_i$, and t represents
the public parameters $g_0, g_1, \ldots, g_{n+1}, p, q$ and c.

Each node can verify the integrity of the packet as follows.

1) From c, compute $c_1, c_2, \ldots, c_m$.

2) Compute the hash value $\mathcal{H}_1 = \mathcal{H}(x)$.

3) Compute the hash value
$\mathcal{H}_2 = \prod_{i=1}^{m} h_i^{c_i} \pmod q$
(where $h_i = g_i$ for $1 \le i \le m$).

4) Verify that $\mathcal{H}_1 = \mathcal{H}_2$.
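The four verification steps can be sketched end to end. The example below is an illustration under assumptions, not the authors' implementation: Python's `random.Random` stands in for the shared generator G, the parameters q = 23, p = 11, g = 2 are toy values with no security, and all names are hypothetical. The source-side part needs the secret exponents; `verify` uses only public data.

```python
import random

Q, P, G = 23, 11, 2     # toy q, p, g with ord(g) = p (assumption; insecure)
N, M = 4, 3             # n data symbols per packet, m packets

def coefficients(seed):
    """Stand-in for the shared generator G: rebuild c_1..c_m from the seed c."""
    prng = random.Random(seed)
    return [prng.randrange(P) for _ in range(M)]

def hash_packet(packet, g):
    h = 1
    for base, symbol in zip(g, packet):
        h = h * pow(base, symbol, Q) % Q
    return h

def verify(packet, seed, g):
    """Steps 1-4: compare H(x) with prod_{i=1}^{m} g_i^{c_i} mod q."""
    expected = 1
    for gi, ci in zip(g[1:M + 1], coefficients(seed)):
        expected = expected * pow(gi, ci, Q) % Q
    return hash_packet(packet, g) == expected

# Source side (requires the secret exponents u_0..u_{n+1}).
rng = random.Random(3)
u = [rng.randrange(1, P) for _ in range(N + 2)]
g = [pow(G, ui, Q) for ui in u]
padded = []
for i in range(1, M + 1):
    data = [rng.randrange(P) for _ in range(N)]
    r = rng.randrange(P)
    s = sum(x * uj for x, uj in zip(data, u[1:N + 1])) + u[N + 1] * r
    padded.append([(u[i] - s) * pow(u[0], -1, P) % P, *data, r])

seed = 2024
c = coefficients(seed)
combined = [sum(ci * pkt[k] for ci, pkt in zip(c, padded)) % P
            for k in range(N + 2)]

# A packet with one symbol altered in transit.
tampered = combined[:]
tampered[1] = (tampered[1] + 1) % P
```

The untampered combination passes `verify`, while the altered packet fails because its hash is multiplied by $g_1 \ne 1$. With a modulus as small as 23 other alterations could collide by chance; the security argument relies on realistic parameter sizes.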

In our scheme, instead of transmitting the coefficients, every
node selects random values and distributes them to all its
downstream nodes. The coefficients are generated from a shared
pseudo-random number generator at each node, and the global
encoding kernels can be calculated recursively in any
upstream-to-downstream order.

In practice, the need for distributing random values can 
be further eliminated by using a public random function. For 
example, it can be the SHA-1 hash of the original file identifier, 
creation date, publisher, and other data that are public and 
should be known to all the receivers before the download 
session begins. 
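One way such a public random function could look is sketched below. The metadata fields, the separator, the counter-mode expansion, and the field size are all assumptions made for the example; the text only states that a SHA-1 hash of public data can serve as the shared value.

```python
import hashlib

P = 2**61 - 1   # toy prime field size (assumption; 2^61 - 1 is a Mersenne prime)

def public_seed(file_id, created, publisher):
    """Hash public metadata into a seed that every receiver can recompute."""
    material = "|".join((file_id, created, publisher)).encode("utf-8")
    return hashlib.sha1(material).digest()

def coefficients(seed, m):
    """Expand the seed into c_1..c_m by hashing seed || counter."""
    out = []
    for i in range(1, m + 1):
        block = hashlib.sha1(seed + i.to_bytes(4, "big")).digest()
        out.append(int.from_bytes(block, "big") % P)
    return out

# Hypothetical metadata, known to all parties before the session starts.
meta = ("example-file-001", "2009-01-01", "example-publisher")
at_source = coefficients(public_seed(*meta), 10)
at_receiver = coefficients(public_seed(*meta), 10)
```

Both sides derive identical coefficients without exchanging anything beyond the metadata. SHA-1 is used here only because the text names it; a modern deployment would likely prefer a stronger hash.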

Our verification scheme enables the nodes to check the 
integrity of packets without the requirement for a secure 
channel. Also, the computation involved in the hash values 
generation and verification processes is very simple. 



IV. Security of our algorithm 

A. Security against the contamination adversaries 

It can be shown that the basic verification scheme is indeed
secure if DL[g,p,q] is hard, using an argument similar to that
in [11] (proof of Theorem 3).

Theorem 1: It is computationally infeasible to find
$X = (x_1, x_2, \ldots, x_m)^T$, y and
$c = (c_1, c_2, \ldots, c_m)$ such that for
$x = \sum_{i=1}^{m} c_i x_i$ we have $y \ne x$ and
$\mathcal{H}(x) = \mathcal{H}(y)$; namely, the basic scheme is
secure if DL[g,p,q] is hard.

Proof: We prove this theorem by contradiction. Suppose there is
a polynomial-time algorithm A that finds
$X = (x_1, x_2, \ldots, x_m)^T$, y and
$c = (c_1, c_2, \ldots, c_m)$ such that for
$x = \sum_{i=1}^{m} c_i x_i$ we have $y \ne x$ and
$\mathcal{H}(x) = \mathcal{H}(y)$, with a probability that is
not negligible. Then we can use it to construct a
polynomial-time algorithm B that finds a collision x and y in
$\mathcal{H}$ with the same non-negligible probability.

However, if DL[g,p,q] is hard, Lemma 2 shows that the hash
function $\mathcal{H}$ is collision-free, and thus this
probability should be negligible, which is a contradiction.
Therefore, the basic scheme is secure if the discrete logarithm
problem DL[g,p,q] is hard. ■

B. Security against the eavesdropping adversaries 

Theorem 2: Consider a network in which the number of
independent messages available to Calvin is less than the
multicast capacity, i.e., $k = \max_i k_i < m$. The algorithm in
Section III-B achieves the practically secure condition with
probability one when a random code is used.

Proof: In our algorithm, Alice transmits the encoded message
$\tilde{X}$ instead of X, so the messages available to Calvin
are $A_i \tilde{X}$. As long as Calvin cannot get
$r_1, r_2, \ldots, r_m$ from $A_i \tilde{X}$, he cannot compute
the global encoding kernels with respect to X, and without them
he cannot get any meaningful information about X. So, by taking
linear combinations of the observed packets $A_i \tilde{X}$,
Calvin should not be able to recover $r_1, \ldots, r_m$, which
implies

$$b_i A_i \tilde{X} \ne I \tilde{X} \quad (\forall b_i, i) \tag{9}$$

where $b_i$ is an $m \times k_i$ matrix over $\mathbb{F}_p$ and I
is the $m \times m$ identity matrix.

Since the number of independent messages available to the
eavesdropper is less than the multicast capacity of the network,
condition (9) can always be satisfied. ■

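Moreover, Calvin cannot get any packet of X by only obtaining
the values $r_1, r_2, \ldots, r_m$, which implies

$$b_i A_i P \ne I_{m,n} \quad (\forall b_i, n, i) \tag{10}$$

Multiplying both sides by $P^{-1}$, we have

$$b_i A_i \ne I_{m,n} P^{-1} \quad (\forall b_i, n, i) \tag{11}$$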



where $I_{m,n}$ is the n-th row of an $m \times m$ identity
matrix and $b_i$ is a row vector of length $k_i$ over
$\mathbb{F}_p$. The above condition is satisfied if each row of
$P^{-1}$ is not in the row space of each $A_i$.

Theorem 3: In a network that supports a multicast capacity
of m, if at most k (k < m) edges can be tapped simultaneously,
then the multicast capacity under practical security
requirements is m minus an asymptotically negligible term, which
corresponds to the overhead due to the redundancy Alice appends
to X.

Proof: The network supports a multicast capacity of m, so
a linear code can be found to multicast m packets [3]. By
Theorem 2, a transformation at the source can be applied to
make the code practically secure. ■
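The row-space condition used in the eavesdropping analysis (no row of $P^{-1}$ lies in the row space of any $A_i$) can be tested mechanically with a rank computation over $\mathbb{F}_p$. The sketch below uses assumed toy values: the prime 257, a 2 x 2 matrix P built from $r = (1, 2)$, and hypothetical single-edge wiretap sets. The function names are not from the paper.

```python
P_MOD = 257  # toy prime (assumption)

def rank_mod_p(rows):
    """Rank of a matrix over F_p via Gauss-Jordan elimination."""
    rows = [[v % P_MOD for v in row] for row in rows]
    rank = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, P_MOD)
        rows[rank] = [v * inv % P_MOD for v in rows[rank]]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                f = rows[i][col]
                rows[i] = [(a - f * b) % P_MOD for a, b in zip(rows[i], rows[rank])]
        rank += 1
        if rank == len(rows):
            break
    return rank

def in_row_space(vector, basis):
    """A vector is in the row space iff appending it does not raise the rank."""
    return rank_mod_p(basis + [vector]) == rank_mod_p(basis)

def practically_secure(p_inverse, wiretap_sets):
    """True if no row of P^{-1} lies in the row space of any A_i."""
    return not any(in_row_space(row, a_i)
                   for a_i in wiretap_sets for row in p_inverse)

# P = [[1, 1], [1, 2]] has inverse [[2, -1], [-1, 1]] modulo 257.
p_inverse = [[2, 256], [256, 1]]
safe_sets = [[[1, 0]], [[0, 1]], [[1, 1]]]     # Calvin taps any single edge
unsafe_sets = [[[4, 255]]]                     # 2 * (2, -1): exposes one packet
```

With these values `practically_secure(p_inverse, safe_sets)` holds, while the set in `unsafe_sets` violates the condition because its kernel is a multiple of the first row of $P^{-1}$.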

V. Discussion 

In this section, we examine the overhead and the start-up
latency induced by our scheme. For a fair comparison, we
choose n = 410, |p| = 320 and |q| = 1024 in the following
discussion. The size of every packet is 16 KB. Moreover,
we assume that the original file is divided into m = 10 packets.

The comparisons are shown in Fig. 3.

A. Communication Overhead 

The communication overhead has two sources. The first is the
amount of data we need to distribute to each node for the
security of the algorithm. The second is the code coefficients.
The actual communication overhead largely depends on the
parameters chosen for a given implementation.

In the scheme proposed by Krohn [9], the parameters chosen
for the homomorphic hash function generate a hash value of
1024 bits per packet. The total size of the coefficients is
3200 bits per packet. Hence, the total size of the
"first-order" hash values and the coefficients is 3.22% of the
original data. For a file of size 1 GB, their method requires
hash values of size 8 MB. To distribute these hash values, the
authors of [9] proposed to recursively apply the same scheme to
the 8 MB of hash values, which generates "second-order" or
higher-order hash values. The size of the higher-order hash
values constitutes 0.01% of the size of the original data.
Hence the total overhead is 3.23%.
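The 3.22% and 8 MB figures follow from the parameters stated at the start of this section. The short calculation below reproduces them, assuming binary units (1 KB = 1024 bytes, 1 GB = 2^30 bytes); the variable names are illustrative.

```python
N_SYMBOLS = 410        # n, symbols per packet
P_BITS = 320           # |p|
Q_BITS = 1024          # |q|
M_PACKETS = 10         # m, coefficients carried with each packet

packet_bits = N_SYMBOLS * P_BITS                # 131200 bits, about 16 KB
coefficient_bits = M_PACKETS * P_BITS           # 3200 bits
first_order = (Q_BITS + coefficient_bits) / packet_bits

file_bytes = 2**30                              # 1 GB file (binary units assumed)
packets_in_file = file_bytes // (16 * 1024)     # number of 16 KB packets
hash_bytes = packets_in_file * Q_BITS // 8      # one 1024-bit hash per packet

print(f"first-order overhead: {first_order:.2%}")
print(f"hash values for 1 GB: {hash_bytes / 2**20:.0f} MB")
```

The first-order overhead is 4224 / 131200, which rounds to 3.22%, and 65536 packets at 128 bytes per hash give exactly 8 MB.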

In the scheme proposed by Zhao [11], if the file is divided
into 10 packets, each packet is a vector over $\mathbb{F}_q$.
The size of each packet is also about 16 KB. The size of each
augmented vector (with the coding vector in front) is about
16.4 KB, and thus the overhead of each packet is 2.43%. On the
other hand, after the initial setup, the scheme of [11] has to
publish 3200 bits for the new signature vector for the security
of the scheme. Thus the total overhead of their scheme is 4.86%.
In conclusion, although they proposed a simple signature scheme,
the communication overhead of their scheme is very high.

The scheme in [10] requires padding with three values, and
the coefficients must also be distributed. The overhead
caused by the coefficients alone is 2.44%. Therefore the
communication overhead of their scheme is higher than ours,
although they use the technique of batch verification. In our
scheme, the coefficients are generated by a pseudo-random
number generator at each node, so their distribution is
avoided. The communication overhead of our scheme is caused
only by the padding we add to every packet. Each distributed
packet incurs only 0.48% overhead, which is negligible
compared with previous works.
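A rough check of these two percentages is given below. It assumes that the padding consists of the two extra symbols per packet ($x_{i0}$ and $r_i$) and measures it against the padded packet; the exact accounting behind the quoted 0.48% is not stated in the text, so the result is only expected to land close to it.

```python
N_SYMBOLS = 410      # n, data symbols per packet
P_BITS = 320         # |p|
M_PACKETS = 10       # m

data_bits = N_SYMBOLS * P_BITS

# Coefficients sent along with each packet (schemes that distribute them).
coefficient_overhead = M_PACKETS * P_BITS / data_bits

# Padding only: x_i0 and r_i, relative to the padded packet (assumption).
padding_overhead = 2 * P_BITS / (data_bits + 2 * P_BITS)

print(f"coefficients: {coefficient_overhead:.4%}")
print(f"padding:      {padding_overhead:.4%}")
```

The coefficient overhead is 3200 / 131200, about 2.44%, and the padding overhead under this assumption is 2 / 412, a little under 0.49%, roughly five times smaller.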

B. Start-Up Latency 

At the beginning of a content distribution session, the source
and all the nodes participating in the distribution have to
agree on the set of parameters used for coding and verification.
The public parameters in our scheme are
$p, q, g_0, g_1, \ldots, g_{n+1}$, and their total size is
approximately 16.3 KB. These parameters are sufficient for any
node to perform verification. Assuming that the bandwidth
between a node and the source (or any other node from which
these parameters are distributed) is 1 Mbps, it would take
less than 0.127 seconds before the node is ready to perform
verification. The start-up latency in our scheme is fixed once
the parameters for the hash function and the block size are
chosen, and is independent of the size of the content to be
distributed. The start-up latency of [11] is 0.127 seconds, more
or less the same as ours. For the scheme in [9], the size of
all the public parameters is the same as the size of the data
in a packet, which is 16 KB. It takes 0.125 seconds to transmit
them over the same link. However, when the node needs to
receive 8 MB of hash values for a 1 GB file, as in the example
given in [9], it requires 64 seconds over the same 1 Mbps link.
The start-up latency is proportional to the size of the file.
The public parameters of [11] consist of two parts: the public
parameters and the signature vectors. The size of their public
parameters is 32.8 KB, and it takes 0.25 seconds to transmit
them over the same link. Furthermore, they have to publish new
signature vectors for the security of their scheme in every
setup.
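The latency figures above are plain size-over-bandwidth computations. The sketch below reproduces them under the assumption that "1 Mbps" means 2^20 bits per second and that KB and MB are binary units, which is the reading that matches the quoted 0.125 s and 64 s exactly.

```python
LINK_BPS = 2**20     # "1 Mbps" taken as 2^20 bits/s (assumption)
KB = 1024            # bytes
MB = 1024 * KB

def transfer_seconds(size_bytes):
    """Time to push size_bytes over the link, ignoring protocol overhead."""
    return size_bytes * 8 / LINK_BPS

latency = {
    "public parameters, 16.3 KB": transfer_seconds(16.3 * KB),
    "parameters of one packet size, 16 KB": transfer_seconds(16 * KB),
    "hash values for a 1 GB file, 8 MB": transfer_seconds(8 * MB),
    "public parameters, 32.8 KB": transfer_seconds(32.8 * KB),
}
for label, seconds in latency.items():
    print(f"{label}: {seconds:.3f} s")
```

Only the 8 MB case grows with the file size; the other three are fixed once the hash parameters and block size are chosen.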

VI. Conclusion 

In this paper, we investigate the security issues that arise
from using network coding and propose a secure algorithm.
With our algorithm, every node can easily verify the integrity
of the received packets, and an eavesdropper is unable to get
any meaningful information about the source. We show that, by
giving up a small amount of overall capacity, the practically
secure condition can be achieved with probability one, which is
much higher than that of Bhattad and Narayanan. We also propose
a new paradigm in which public parameters are selected as the
"intended" hash values and the code coefficients are generated
by a pseudo-random number generator at every node. In this way,
the distribution of the hash values and the coefficients is
avoided. We have shown that the communication overhead of our
algorithm is negligible compared with previous works and that
the start-up latency is short.